[VPlan] Enable vectorization of early-exit loops with unit-stride fault-only-first loads by arcbbb · Pull Request #151300 · llvm/llvm-project

arcbbb · 2025-07-30T09:54:24Z

Following #152422, this patch enables auto-vectorization of early-exit loops containing a single potentially faulting, unit-stride load by using the vp.load.ff intrinsic introduced in #128593.

Key changes:

Add VPWidenFFLoadRecipe that produces two results: the loaded vector and VL (the number of lanes actually loaded).
Use VL to ensure correctness:
- Step the induction variable by VL (variable-length stepping).
- Cap AVL to min(VF, remainder) to avoid over-reads at last iteration.
- Compute the early-exit condition from VL by replacing AnyOf with vp.reduce.or to avoid branch-on-poison.
Introduce two transforms:
- adjustFFLoadEarlyExitForPoisonSafety: rewrites the exit condition (AnyOf → vp.reduce.or(VL)) and sets AVL = min(VF, remainder).
- convertFFLoadEarlyExitToVLStepping: after region dissolution, converts early-exit loops to step by VL.

Limitations:

Supports a single potentially faulting load with unit stride.
Interleave count (IC) must be 1.

stacks on top of #165218.

arcbbb · 2025-07-30T09:54:44Z

Regarding EVL propagation, I just noticed that header mask handling is fixed in #150202. This fix will need to be incorporated accordingly.

github-actions · 2025-07-30T09:57:31Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

llvm/include/llvm/CodeGen/SelectionDAG.h

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

alexey-bataev · 2025-07-30T13:01:29Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Check this first?

Updated. Thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

david-arm · 2025-07-30T13:13:43Z

I think it would be good for reviewers if this patch was split up into several parts, probably roughly in this order:

Codegen specific parts related to the RISCV intrinsic,
A standalone loop vectorisation legality patch similar to [LV] Add initial legality checks for early exit loops with side effects #145663,
A loop vectoriser patch that starts vectorising the loops you're interested in.

arcbbb · 2025-08-01T13:05:52Z

I think it would be good for reviewers if this patch was split up into several parts, probably roughly in this order:

Codegen specific parts related to the RISCV intrinsic,

A standalone loop vectorisation legality patch similar to [LV] Add initial legality checks for ee loops with stores #145663,

A loop vectoriser patch that starts vectorising the loops you're interested in.

Thanks for outlining the steps. That’s very helpful! I’ll follow this order for splitting up the patch.

lukel97 · 2025-09-02T04:50:05Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7749,6 +7758,12 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
    Builder.insert(VectorPtr);
    Ptr = VectorPtr;
  }
+  if (Legal->getSpeculativeLoads().contains(I)) {
+    auto *Load = dyn_cast<LoadInst>(I);
+    return new VPWidenFFLoadRecipe(*Load, Ptr, Mask, VPIRMetadata(*Load, LVer),


Just making a note for later. I think it would be good to avoid having a dead VPWidenFFLoadRecipe when there's no non-EVL version of vp.load.ff.

Can we instead just have one VPWidenFFLoadRecipe and here pass VF as the EVL argument? optimizeMaskToEVL can then set the EVL from the header mask later.

Updated! After rebasing, this patch supports WidenFFLoad in non–tail-folded mode only. To tail-fold early-exit loops and FFLoad support in the EVL transform will have to be addressed in a separate patch.

This patch splits out the legality checks from PR #151300, following the landing of PR #128593. It is a step toward supporting vectorization of early-exit loops that contain potentially faulting loads. In this commit, an early-exit loop is considered legal for vectorization if it satisfies the following criteria: 1. it is a read-only loop. 2. all potentially faulting loads are unit-stride, which is the only type currently supported by vp.load.ff.

llvmbot · 2025-09-21T01:43:02Z

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Shih-Po Hung (arcbbb)

Changes

Following #152422, this patch enables auto-vectorization of early-exit loops containing a single potentially faulting, unit-stride load by using the vp.load.ff intrinsic introduced in #128593.

Key changes:

Add VPWidenFFLoadRecipe that produces two results: the loaded vector and VL (the number of lanes actually loaded).
Use VL to ensure correctness:
- Step the induction variable by VL (variable-length stepping).
- Cap AVL to min(VF, remainder) to avoid over-reads at last iteration.
- Compute the early-exit condition from VL by replacing AnyOf with vp.reduce.or to avoid branch-on-poison.
Introduce two transforms:
- adjustFFLoadEarlyExitForPoisonSafety: rewrites the exit condition (AnyOf → vp.reduce.or(VL)) and sets AVL = min(VF, remainder).
- convertFFLoadEarlyExitToVLStepping: after region dissolution, converts early-exit loops to step by VL.

Limitations:

Supports a single potentially faulting load with unit stride.
Interleave count (IC) must be 1.

Patch is 34.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151300.diff

9 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+41-1)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+45)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+3-2)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+43)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+96)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+11)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+3)
(modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (+2-2)
(added) llvm/test/Transforms/LoopVectorize/RISCV/find.ll (+236)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 1d3cffa2b61bf..e28d4c45d4ab8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -393,6 +393,12 @@ static cl::opt<bool> EnableEarlyExitVectorization(
     cl::desc(
         "Enable vectorization of early exit loops with uncountable exits."));
 
+static cl::opt<bool>
+    EnableEarlyExitWithFFLoads("enable-early-exit-with-ffload", cl::init(false),
+                               cl::Hidden,
+                               cl::desc("Enable vectorization of early-exit "
+                                        "loops with fault-only-first loads."));
+
 static cl::opt<bool> ConsiderRegPressure(
     "vectorizer-consider-reg-pressure", cl::init(false), cl::Hidden,
     cl::desc("Discard VFs if their register pressure is too high."));
@@ -3507,6 +3513,15 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
     return FixedScalableVFPair::getNone();
   }
 
+  if (!Legal->getPotentiallyFaultingLoads().empty() && UserIC > 1) {
+    reportVectorizationFailure("Auto-vectorization of loops with potentially "
+                               "faulting loads is not supported when the "
+                               "interleave count is more than 1",
+                               "CantInterleaveLoopWithPotentiallyFaultingLoads",
+                               ORE, TheLoop);
+    return FixedScalableVFPair::getNone();
+  }
+
   ScalarEvolution *SE = PSE.getSE();
   ElementCount TC = getSmallConstantTripCount(SE, TheLoop);
   unsigned MaxTC = PSE.getSmallConstantMaxTripCount();
@@ -4076,6 +4091,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
       case VPDef::VPReductionPHISC:
       case VPDef::VPInterleaveEVLSC:
       case VPDef::VPInterleaveSC:
+      case VPDef::VPWidenFFLoadSC:
       case VPDef::VPWidenLoadEVLSC:
       case VPDef::VPWidenLoadSC:
       case VPDef::VPWidenStoreEVLSC:
@@ -4550,6 +4566,10 @@ LoopVectorizationPlanner::selectInterleaveCount(VPlan &Plan, ElementCount VF,
   if (!Legal->isSafeForAnyVectorWidth())
     return 1;
 
+  // No interleaving for potentially faulting loads.
+  if (!Legal->getPotentiallyFaultingLoads().empty())
+    return 1;
+
   // We don't attempt to perform interleaving for loops with uncountable early
   // exits because the VPInstruction::AnyOf code cannot currently handle
   // multiple parts.
@@ -7216,6 +7236,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   // Regions are dissolved after optimizing for VF and UF, which completely
   // removes unneeded loop regions first.
   VPlanTransforms::dissolveLoopRegions(BestVPlan);
+
+  VPlanTransforms::convertFFLoadEarlyExitToVLStepping(BestVPlan);
+
   // Canonicalize EVL loops after regions are dissolved.
   VPlanTransforms::canonicalizeEVLLoops(BestVPlan);
   VPlanTransforms::materializeBackedgeTakenCount(BestVPlan, VectorPH);
@@ -7598,6 +7621,10 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
     Builder.insert(VectorPtr);
     Ptr = VectorPtr;
   }
+  if (Legal->getPotentiallyFaultingLoads().contains(I))
+    return new VPWidenFFLoadRecipe(*cast<LoadInst>(I), Ptr, &Plan.getVF(), Mask,
+                                   VPIRMetadata(*I, LVer), I->getDebugLoc());
+
   if (LoadInst *Load = dyn_cast<LoadInst>(I))
     return new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, Reverse,
                                  VPIRMetadata(*Load, LVer), I->getDebugLoc());
@@ -8632,6 +8659,10 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
       if (Recipe->getNumDefinedValues() == 1) {
         SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
         Old2New[SingleDef] = Recipe->getVPSingleValue();
+      } else if (isa<VPWidenFFLoadRecipe>(Recipe)) {
+        VPValue *Data = Recipe->getVPValue(0);
+        SingleDef->replaceAllUsesWith(Data);
+        Old2New[SingleDef] = Data;
       } else {
         assert(Recipe->getNumDefinedValues() == 0 &&
                "Unexpected multidef recipe");
@@ -8679,6 +8710,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // Adjust the recipes for any inloop reductions.
   adjustRecipesForReductions(Plan, RecipeBuilder, Range.Start);
 
+  VPlanTransforms::adjustFFLoadEarlyExitForPoisonSafety(*Plan);
+
   // Apply mandatory transformation to handle FP maxnum/minnum reduction with
   // NaNs if possible, bail out otherwise.
   if (!VPlanTransforms::runPass(VPlanTransforms::handleMaxMinNumReductions,
@@ -9869,7 +9902,14 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     return false;
   }
 
-  if (!LVL.getPotentiallyFaultingLoads().empty()) {
+  if (EnableEarlyExitWithFFLoads) {
+    if (LVL.getPotentiallyFaultingLoads().size() > 1) {
+      reportVectorizationFailure("Auto-vectorization of loops with more than 1 "
+                                 "potentially faulting load is not enabled",
+                                 "MoreThanOnePotentiallyFaultingLoad", ORE, L);
+      return false;
+    }
+  } else if (!LVL.getPotentiallyFaultingLoads().empty()) {
     reportVectorizationFailure("Auto-vectorization of loops with potentially "
                                "faulting load is not supported",
                                "PotentiallyFaultingLoadsNotSupported", ORE, L);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index f79855f7e2c5f..6e28c95ca601a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -563,6 +563,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPInterleaveEVLSC:
     case VPRecipeBase::VPInterleaveSC:
     case VPRecipeBase::VPIRInstructionSC:
+    case VPRecipeBase::VPWidenFFLoadSC:
     case VPRecipeBase::VPWidenLoadEVLSC:
     case VPRecipeBase::VPWidenLoadSC:
     case VPRecipeBase::VPWidenStoreEVLSC:
@@ -2811,6 +2812,13 @@ class LLVM_ABI_FOR_TEST VPReductionEVLRecipe : public VPReductionRecipe {
             ArrayRef<VPValue *>({R.getChainOp(), R.getVecOp(), &EVL}), CondOp,
             R.isOrdered(), DL) {}
 
+  VPReductionEVLRecipe(RecurKind RdxKind, FastMathFlags FMFs, VPValue *ChainOp,
+                       VPValue *VecOp, VPValue &EVL, VPValue *CondOp,
+                       bool IsOrdered, DebugLoc DL = DebugLoc::getUnknown())
+      : VPReductionRecipe(VPDef::VPReductionEVLSC, RdxKind, FMFs, nullptr,
+                          ArrayRef<VPValue *>({ChainOp, VecOp, &EVL}), CondOp,
+                          IsOrdered, DL) {}
+
   ~VPReductionEVLRecipe() override = default;
 
   VPReductionEVLRecipe *clone() override {
@@ -3159,6 +3167,7 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   static inline bool classof(const VPRecipeBase *R) {
     return R->getVPDefID() == VPRecipeBase::VPWidenLoadSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenStoreSC ||
+           R->getVPDefID() == VPRecipeBase::VPWidenFFLoadSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenLoadEVLSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenStoreEVLSC;
   }
@@ -3240,6 +3249,42 @@ struct LLVM_ABI_FOR_TEST VPWidenLoadRecipe final : public VPWidenMemoryRecipe,
   }
 };
 
+/// A recipe for widening loads using fault-only-first intrinsics.
+/// Produces two results: (1) the loaded data, and (2) the index of the first
+/// non-dereferenceable lane, or VF if all lanes are successfully read.
+struct VPWidenFFLoadRecipe final : public VPWidenMemoryRecipe, public VPValue {
+  VPWidenFFLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *VF, VPValue *Mask,
+                      const VPIRMetadata &Metadata, DebugLoc DL)
+      : VPWidenMemoryRecipe(VPDef::VPWidenFFLoadSC, Load, {Addr, VF},
+                            /*Consecutive*/ true, /*Reverse*/ false, Metadata,
+                            DL),
+        VPValue(this, &Load) {
+    new VPValue(nullptr, this); // Index of the first lane that faults.
+    setMask(Mask);
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenFFLoadSC);
+
+  /// Return the VF operand.
+  VPValue *getVF() const { return getOperand(1); }
+  void setVF(VPValue *V) { setOperand(1, V); }
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  /// Returns true if the recipe only uses the first lane of operand \p Op.
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return Op == getVF() || Op == getAddr();
+  }
+};
+
 /// A recipe for widening load operations with vector-predication intrinsics,
 /// using the address to load from, the explicit vector length and an optional
 /// mask.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 46ab7712e2671..684dbd25597e3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -188,8 +188,9 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPWidenCallRecipe *R) {
 }
 
 Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPWidenMemoryRecipe *R) {
-  assert((isa<VPWidenLoadRecipe, VPWidenLoadEVLRecipe>(R)) &&
-         "Store recipes should not define any values");
+  assert(
+      (isa<VPWidenLoadRecipe, VPWidenFFLoadRecipe, VPWidenLoadEVLRecipe>(R)) &&
+      "Store recipes should not define any values");
   return cast<LoadInst>(&R->getIngredient())->getType();
 }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 8e9c3db50319f..3da8613a1d3cc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -73,6 +73,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
   case VPReductionPHISC:
   case VPScalarIVStepsSC:
   case VPPredInstPHISC:
+  case VPWidenFFLoadSC:
     return false;
   case VPBlendSC:
   case VPReductionEVLSC:
@@ -107,6 +108,7 @@ bool VPRecipeBase::mayReadFromMemory() const {
     return cast<VPInstruction>(this)->opcodeMayReadOrWriteFromMemory();
   case VPWidenLoadEVLSC:
   case VPWidenLoadSC:
+  case VPWidenFFLoadSC:
     return true;
   case VPReplicateSC:
     return cast<Instruction>(getVPSingleValue()->getUnderlyingValue())
@@ -3409,6 +3411,47 @@ void VPWidenLoadRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+void VPWidenFFLoadRecipe::execute(VPTransformState &State) {
+  Type *ScalarDataTy = getLoadStoreType(&Ingredient);
+  auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
+  const Align Alignment = getLoadStoreAlignment(&Ingredient);
+
+  auto &Builder = State.Builder;
+  State.setDebugLocFrom(getDebugLoc());
+
+  Value *VL = State.get(getVF(), VPLane(0));
+  Type *I32Ty = Builder.getInt32Ty();
+  VL = Builder.CreateZExtOrTrunc(VL, I32Ty);
+  Value *Addr = State.get(getAddr(), true);
+  Value *Mask = nullptr;
+  if (VPValue *VPMask = getMask())
+    Mask = State.get(VPMask);
+  else
+    Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
+  CallInst *NewLI =
+      Builder.CreateIntrinsic(Intrinsic::vp_load_ff, {DataTy, Addr->getType()},
+                              {Addr, Mask, VL}, nullptr, "vp.op.load.ff");
+  NewLI->addParamAttr(
+      0, Attribute::getWithAlignment(NewLI->getContext(), Alignment));
+  applyMetadata(*NewLI);
+  Value *V = cast<Instruction>(Builder.CreateExtractValue(NewLI, 0));
+  Value *NewVL = Builder.CreateExtractValue(NewLI, 1);
+  State.set(getVPValue(0), V);
+  State.set(getVPValue(1), NewVL, /*NeedsScalar=*/true);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenFFLoadRecipe::print(raw_ostream &O, const Twine &Indent,
+                                VPSlotTracker &SlotTracker) const {
+  O << Indent << "WIDEN ";
+  printAsOperand(O, SlotTracker);
+  O << ", ";
+  getVPValue(1)->printAsOperand(O, SlotTracker);
+  O << " = vp.load.ff ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
 /// Use all-true mask for reverse rather than actual mask, as it avoids a
 /// dependence w/o affecting the result.
 static Instruction *createReverseEVL(IRBuilderBase &Builder, Value *Operand,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 1f6b85270607e..7e78cb6ed02ac 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2760,6 +2760,102 @@ void VPlanTransforms::addExplicitVectorLength(
   Plan.setUF(1);
 }
 
+void VPlanTransforms::adjustFFLoadEarlyExitForPoisonSafety(VPlan &Plan) {
+  VPBasicBlock *Header = Plan.getVectorLoopRegion()->getEntryBasicBlock();
+  VPWidenFFLoadRecipe *LastFFLoad = nullptr;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getVectorLoopRegion())))
+    for (VPRecipeBase &R : *VPBB)
+      if (auto *Load = dyn_cast<VPWidenFFLoadRecipe>(&R)) {
+        assert(!LastFFLoad && "Only one FFLoad is supported");
+        LastFFLoad = Load;
+      }
+
+  // Skip if no FFLoad.
+  if (!LastFFLoad)
+    return;
+
+  // Ensure FFLoad does not read past the remainder in the last iteration.
+  // Set AVL to min(VF, remainder).
+  VPBuilder Builder(Header, Header->getFirstNonPhi());
+  VPValue *Remainder = Builder.createNaryOp(
+      Instruction::Sub, {&Plan.getVectorTripCount(), Plan.getCanonicalIV()});
+  VPValue *Cmp =
+      Builder.createICmp(CmpInst::ICMP_ULE, &Plan.getVF(), Remainder);
+  VPValue *AVL = Builder.createSelect(Cmp, &Plan.getVF(), Remainder);
+  LastFFLoad->setVF(AVL);
+
+  // To prevent branch-on-poison, rewrite the early-exit condition to
+  // VPReductionEVLRecipe. Expected pattern here is:
+  //   EMIT vp<%alt.exit.cond> = AnyOf
+  //   EMIT vp<%exit.cond> = or vp<%alt.exit.cond>, vp<%main.exit.cond>
+  //   EMIT branch-on-cond vp<%exit.cond>
+  auto *ExitingLatch = cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getExiting());
+  auto *LatchExitingBr = cast<VPInstruction>(ExitingLatch->getTerminator());
+
+  VPValue *VPAnyOf = nullptr;
+  VPValue *VecOp = nullptr;
+  assert(
+      match(LatchExitingBr,
+            m_BranchOnCond(m_BinaryOr(m_VPValue(VPAnyOf), m_VPValue()))) &&
+      match(VPAnyOf, m_VPInstruction<VPInstruction::AnyOf>(m_VPValue(VecOp))) &&
+      "unexpected exiting sequence in early exit loop");
+
+  VPValue *OpVPEVLI32 = LastFFLoad->getVPValue(1);
+  VPValue *Mask = LastFFLoad->getMask();
+  FastMathFlags FMF;
+  auto *I1Ty = Type::getInt1Ty(Plan.getContext());
+  VPValue *VPZero = Plan.getOrAddLiveIn(ConstantInt::get(I1Ty, 0));
+  DebugLoc DL = VPAnyOf->getDefiningRecipe()->getDebugLoc();
+  auto *NewAnyOf =
+      new VPReductionEVLRecipe(RecurKind::Or, FMF, VPZero, VecOp, *OpVPEVLI32,
+                               Mask, /*IsOrdered*/ false, DL);
+  NewAnyOf->insertBefore(VPAnyOf->getDefiningRecipe());
+  VPAnyOf->replaceAllUsesWith(NewAnyOf);
+
+  // Using FirstActiveLane in the early-exit block is safe,
+  // exiting conditions guarantees at least one valid lane precedes
+  // any poisoned lanes.
+}
+
+void VPlanTransforms::convertFFLoadEarlyExitToVLStepping(VPlan &Plan) {
+  // Find loop header by locating VPWidenFFLoadRecipe.
+  VPWidenFFLoadRecipe *LastFFLoad = nullptr;
+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getEntry())))
+    for (VPRecipeBase &R : *VPBB)
+      if (auto *Load = dyn_cast<VPWidenFFLoadRecipe>(&R)) {
+        assert(!LastFFLoad && "Only one FFLoad is supported");
+        LastFFLoad = Load;
+      }
+
+  // Skip if no FFLoad.
+  if (!LastFFLoad)
+    return;
+
+  VPBasicBlock *HeaderVPBB = LastFFLoad->getParent();
+  // Replace IVStep (VFxUF) with returned VL from FFLoad.
+  auto *CanonicalIV = cast<VPPhi>(&*HeaderVPBB->begin());
+  VPValue *Backedge = CanonicalIV->getIncomingValue(1);
+  assert(match(Backedge, m_c_Add(m_Specific(CanonicalIV),
+                                 m_Specific(&Plan.getVFxUF()))) &&
+         "Unexpected canonical iv");
+  VPRecipeBase *CanonicalIVIncrement = Backedge->getDefiningRecipe();
+  VPValue *OpVPEVLI32 = LastFFLoad->getVPValue(1);
+  VPBuilder Builder(HeaderVPBB, HeaderVPBB->getFirstNonPhi());
+  Builder.setInsertPoint(CanonicalIVIncrement);
+  auto *TC = Plan.getTripCount();
+  Type *CanIVTy = TC->isLiveIn()
+                      ? TC->getLiveInIRValue()->getType()
+                      : cast<VPExpandSCEVRecipe>(TC)->getSCEV()->getType();
+  auto *I32Ty = Type::getInt32Ty(Plan.getContext());
+  VPValue *OpVPEVL = Builder.createScalarZExtOrTrunc(
+      OpVPEVLI32, CanIVTy, I32Ty, CanonicalIVIncrement->getDebugLoc());
+
+  CanonicalIVIncrement->setOperand(1, OpVPEVL);
+}
+
 void VPlanTransforms::canonicalizeEVLLoops(VPlan &Plan) {
   // Find EVL loop entries by locating VPEVLBasedIVPHIRecipe.
   // There should be only one EVL PHI in the entire plan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 69452a7e37572..bc5ce3bc43e76 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -269,6 +269,17 @@ struct VPlanTransforms {
   ///      (branch-on-cond eq AVLNext, 0)
   static void canonicalizeEVLLoops(VPlan &Plan);
 
+  /// Applies to early-exit loops that use FFLoad. FFLoad may yield fewer active
+  /// lanes than VF. To prevent branch-on-poison and over-reads past the vector
+  /// trip count, use the returned VL for both stepping and exit computation.
+  /// Implemented by:
+  ///  - adjustFFLoadEarlyExitForPoisonSafety: replace AnyOf with vp.reduce.or over
+  ///    the first VL lanes; set AVL = min(VF, remainder).
+  ///  - convertFFLoadEarlyExitToVLStepping: after region dissolution, convert
+  ///    early-exit loops to variable-length stepping.
+  static void adjustFFLoadEarlyExitForPoisonSafety(VPlan &Plan);
+  static void convertFFLoadEarlyExitToVLStepping(VPlan &Plan);
+
   /// Lower abstract recipes to concrete ones, that can be codegen'd.
   static void convertToConcreteRecipes(VPlan &Plan);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index 0678bc90ef4b5..b2bc430a09686 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -40,6 +40,7 @@ class VPUser;
 class VPRecipeBase;
 class VPInterleaveBase;
 class VPPhiAccessors;
+class VPWidenFFLoadRecipe;
 
 // This is the base class of the VPlan Def/Use graph, used for modeling the data
 // flow into, within and out of the VPlan. VPValues can stand for live-ins
@@ -51,6 +52,7 @@ class LLVM_ABI_FOR_TEST VPValue {
   friend class VPInterleaveBase;
   friend class VPlan;
   friend class VPExpressionRecipe;
+  friend class VPWidenFFLoadRecipe;
 
   const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).
 
@@ -351,6 +353,7 @@ class VPDef {
     VPWidenCastSC,
     VPWidenGEPSC,
     VPWidenIntrinsicSC,
+    VPWidenFFLoadSC,
     VPWidenLoadEVLSC,
     VPWidenLoadSC,
     VPWidenStoreEVLSC,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index 92caa0b4e51d5..70e6e0d006eb6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -166,8 +166,8 @@ bool VPlanVerifier::verifyEVLRecipe(const VPInstruction &EVL) const {
           }
           return VerifyEVLUse(*R, 2);
         })
-        .Case<VPWidenLoadEVLRecipe, VPVectorEndPointerRecipe,
-              VPInterleaveEVLRecipe>(
+        .Case<VPWidenLoadEVLRecipe, VPWidenFFLoadRecipe,
+              VPVectorEndPointerRecipe, VPInterleaveEVLRecipe>(
             [&](const VPRecipeBase *R) { return VerifyEVLUse(*R, 1); })
         .Case<VPInstructionWithType>(
             [&](const VPInstructionWithType *S) { return VerifyEVLUse(*S, 0); })
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/find.ll b/llvm/test/Transforms/LoopVectorize/RISCV/find.ll
new file mode 100644
index 0000000000000..f734bd5f53c82
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/find.ll
@@ -0...
[truncated]

llvmbot · 2025-09-21T01:43:02Z

@llvm/pr-subscribers-backend-risc-v

Author: Shih-Po Hung (arcbbb)

Changes

Following #152422, this patch enables auto-vectorization of early-exit loops containing a single potentially faulting, unit-stride load by using the vp.load.ff intrinsic introduced in #128593.

Key changes:

Add VPWidenFFLoadRecipe that produces two results: the loaded vector and VL (the number of lanes actually loaded).
Use VL to ensure correctness:
- Step the induction variable by VL (variable-length stepping).
- Cap AVL to min(VF, remainder) to avoid over-reads at last iteration.
- Compute the early-exit condition from VL by replacing AnyOf with vp.reduce.or to avoid branch-on-poison.
Introduce two transforms:
- adjustFFLoadEarlyExitForPoisonSafety: rewrites the exit condition (AnyOf → vp.reduce.or(VL)) and sets AVL = min(VF, remainder).
- convertFFLoadEarlyExitToVLStepping: after region dissolution, converts early-exit loops to step by VL.

Limitations:

Supports a single potentially faulting load with unit stride.
Interleave count (IC) must be 1.

Patch is 34.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151300.diff

9 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+41-1)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+45)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+3-2)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+43)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+96)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+11)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+3)
(modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (+2-2)
(added) llvm/test/Transforms/LoopVectorize/RISCV/find.ll (+236)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 1d3cffa2b61bf..e28d4c45d4ab8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -393,6 +393,12 @@ static cl::opt<bool> EnableEarlyExitVectorization(
     cl::desc(
         "Enable vectorization of early exit loops with uncountable exits."));
 
+static cl::opt<bool>
+    EnableEarlyExitWithFFLoads("enable-early-exit-with-ffload", cl::init(false),
+                               cl::Hidden,
+                               cl::desc("Enable vectorization of early-exit "
+                                        "loops with fault-only-first loads."));
+
 static cl::opt<bool> ConsiderRegPressure(
     "vectorizer-consider-reg-pressure", cl::init(false), cl::Hidden,
     cl::desc("Discard VFs if their register pressure is too high."));
@@ -3507,6 +3513,15 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
     return FixedScalableVFPair::getNone();
   }
 
+  if (!Legal->getPotentiallyFaultingLoads().empty() && UserIC > 1) {
+    reportVectorizationFailure("Auto-vectorization of loops with potentially "
+                               "faulting loads is not supported when the "
+                               "interleave count is more than 1",
+                               "CantInterleaveLoopWithPotentiallyFaultingLoads",
+                               ORE, TheLoop);
+    return FixedScalableVFPair::getNone();
+  }
+
   ScalarEvolution *SE = PSE.getSE();
   ElementCount TC = getSmallConstantTripCount(SE, TheLoop);
   unsigned MaxTC = PSE.getSmallConstantMaxTripCount();
@@ -4076,6 +4091,7 @@ static bool willGenerateVectors(VPlan &Plan, ElementCount VF,
       case VPDef::VPReductionPHISC:
       case VPDef::VPInterleaveEVLSC:
       case VPDef::VPInterleaveSC:
+      case VPDef::VPWidenFFLoadSC:
       case VPDef::VPWidenLoadEVLSC:
       case VPDef::VPWidenLoadSC:
       case VPDef::VPWidenStoreEVLSC:
@@ -4550,6 +4566,10 @@ LoopVectorizationPlanner::selectInterleaveCount(VPlan &Plan, ElementCount VF,
   if (!Legal->isSafeForAnyVectorWidth())
     return 1;
 
+  // No interleaving for potentially faulting loads.
+  if (!Legal->getPotentiallyFaultingLoads().empty())
+    return 1;
+
   // We don't attempt to perform interleaving for loops with uncountable early
   // exits because the VPInstruction::AnyOf code cannot currently handle
   // multiple parts.
@@ -7216,6 +7236,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   // Regions are dissolved after optimizing for VF and UF, which completely
   // removes unneeded loop regions first.
   VPlanTransforms::dissolveLoopRegions(BestVPlan);
+
+  VPlanTransforms::convertFFLoadEarlyExitToVLStepping(BestVPlan);
+
   // Canonicalize EVL loops after regions are dissolved.
   VPlanTransforms::canonicalizeEVLLoops(BestVPlan);
   VPlanTransforms::materializeBackedgeTakenCount(BestVPlan, VectorPH);
@@ -7598,6 +7621,10 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
     Builder.insert(VectorPtr);
     Ptr = VectorPtr;
   }
+  if (Legal->getPotentiallyFaultingLoads().contains(I))
+    return new VPWidenFFLoadRecipe(*cast<LoadInst>(I), Ptr, &Plan.getVF(), Mask,
+                                   VPIRMetadata(*I, LVer), I->getDebugLoc());
+
   if (LoadInst *Load = dyn_cast<LoadInst>(I))
     return new VPWidenLoadRecipe(*Load, Ptr, Mask, Consecutive, Reverse,
                                  VPIRMetadata(*Load, LVer), I->getDebugLoc());
@@ -8632,6 +8659,10 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
       if (Recipe->getNumDefinedValues() == 1) {
         SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
         Old2New[SingleDef] = Recipe->getVPSingleValue();
+      } else if (isa<VPWidenFFLoadRecipe>(Recipe)) {
+        VPValue *Data = Recipe->getVPValue(0);
+        SingleDef->replaceAllUsesWith(Data);
+        Old2New[SingleDef] = Data;
       } else {
         assert(Recipe->getNumDefinedValues() == 0 &&
                "Unexpected multidef recipe");
@@ -8679,6 +8710,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // Adjust the recipes for any inloop reductions.
   adjustRecipesForReductions(Plan, RecipeBuilder, Range.Start);
 
+  VPlanTransforms::adjustFFLoadEarlyExitForPoisonSafety(*Plan);
+
   // Apply mandatory transformation to handle FP maxnum/minnum reduction with
   // NaNs if possible, bail out otherwise.
   if (!VPlanTransforms::runPass(VPlanTransforms::handleMaxMinNumReductions,
@@ -9869,7 +9902,14 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     return false;
   }
 
-  if (!LVL.getPotentiallyFaultingLoads().empty()) {
+  if (EnableEarlyExitWithFFLoads) {
+    if (LVL.getPotentiallyFaultingLoads().size() > 1) {
+      reportVectorizationFailure("Auto-vectorization of loops with more than 1 "
+                                 "potentially faulting load is not enabled",
+                                 "MoreThanOnePotentiallyFaultingLoad", ORE, L);
+      return false;
+    }
+  } else if (!LVL.getPotentiallyFaultingLoads().empty()) {
     reportVectorizationFailure("Auto-vectorization of loops with potentially "
                                "faulting load is not supported",
                                "PotentiallyFaultingLoadsNotSupported", ORE, L);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index f79855f7e2c5f..6e28c95ca601a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -563,6 +563,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPInterleaveEVLSC:
     case VPRecipeBase::VPInterleaveSC:
     case VPRecipeBase::VPIRInstructionSC:
+    case VPRecipeBase::VPWidenFFLoadSC:
     case VPRecipeBase::VPWidenLoadEVLSC:
     case VPRecipeBase::VPWidenLoadSC:
     case VPRecipeBase::VPWidenStoreEVLSC:
@@ -2811,6 +2812,13 @@ class LLVM_ABI_FOR_TEST VPReductionEVLRecipe : public VPReductionRecipe {
             ArrayRef<VPValue *>({R.getChainOp(), R.getVecOp(), &EVL}), CondOp,
             R.isOrdered(), DL) {}
 
+  VPReductionEVLRecipe(RecurKind RdxKind, FastMathFlags FMFs, VPValue *ChainOp,
+                       VPValue *VecOp, VPValue &EVL, VPValue *CondOp,
+                       bool IsOrdered, DebugLoc DL = DebugLoc::getUnknown())
+      : VPReductionRecipe(VPDef::VPReductionEVLSC, RdxKind, FMFs, nullptr,
+                          ArrayRef<VPValue *>({ChainOp, VecOp, &EVL}), CondOp,
+                          IsOrdered, DL) {}
+
   ~VPReductionEVLRecipe() override = default;
 
   VPReductionEVLRecipe *clone() override {
@@ -3159,6 +3167,7 @@ class LLVM_ABI_FOR_TEST VPWidenMemoryRecipe : public VPRecipeBase,
   static inline bool classof(const VPRecipeBase *R) {
     return R->getVPDefID() == VPRecipeBase::VPWidenLoadSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenStoreSC ||
+           R->getVPDefID() == VPRecipeBase::VPWidenFFLoadSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenLoadEVLSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenStoreEVLSC;
   }
@@ -3240,6 +3249,42 @@ struct LLVM_ABI_FOR_TEST VPWidenLoadRecipe final : public VPWidenMemoryRecipe,
   }
 };
 
+/// A recipe for widening loads using fault-only-first intrinsics.
+/// Produces two results: (1) the loaded data, and (2) the index of the first
+/// non-dereferenceable lane, or VF if all lanes are successfully read.
+struct VPWidenFFLoadRecipe final : public VPWidenMemoryRecipe, public VPValue {
+  VPWidenFFLoadRecipe(LoadInst &Load, VPValue *Addr, VPValue *VF, VPValue *Mask,
+                      const VPIRMetadata &Metadata, DebugLoc DL)
+      : VPWidenMemoryRecipe(VPDef::VPWidenFFLoadSC, Load, {Addr, VF},
+                            /*Consecutive*/ true, /*Reverse*/ false, Metadata,
+                            DL),
+        VPValue(this, &Load) {
+    new VPValue(nullptr, this); // Index of the first lane that faults.
+    setMask(Mask);
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenFFLoadSC);
+
+  /// Return the VF operand.
+  VPValue *getVF() const { return getOperand(1); }
+  void setVF(VPValue *V) { setOperand(1, V); }
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  /// Returns true if the recipe only uses the first lane of operand \p Op.
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return Op == getVF() || Op == getAddr();
+  }
+};
+
 /// A recipe for widening load operations with vector-predication intrinsics,
 /// using the address to load from, the explicit vector length and an optional
 /// mask.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 46ab7712e2671..684dbd25597e3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -188,8 +188,9 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPWidenCallRecipe *R) {
 }
 
 Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPWidenMemoryRecipe *R) {
-  assert((isa<VPWidenLoadRecipe, VPWidenLoadEVLRecipe>(R)) &&
-         "Store recipes should not define any values");
+  assert(
+      (isa<VPWidenLoadRecipe, VPWidenFFLoadRecipe, VPWidenLoadEVLRecipe>(R)) &&
+      "Store recipes should not define any values");
   return cast<LoadInst>(&R->getIngredient())->getType();
 }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 8e9c3db50319f..3da8613a1d3cc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -73,6 +73,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
   case VPReductionPHISC:
   case VPScalarIVStepsSC:
   case VPPredInstPHISC:
+  case VPWidenFFLoadSC:
     return false;
   case VPBlendSC:
   case VPReductionEVLSC:
@@ -107,6 +108,7 @@ bool VPRecipeBase::mayReadFromMemory() const {
     return cast<VPInstruction>(this)->opcodeMayReadOrWriteFromMemory();
   case VPWidenLoadEVLSC:
   case VPWidenLoadSC:
+  case VPWidenFFLoadSC:
     return true;
   case VPReplicateSC:
     return cast<Instruction>(getVPSingleValue()->getUnderlyingValue())
@@ -3409,6 +3411,47 @@ void VPWidenLoadRecipe::print(raw_ostream &O, const Twine &Indent,
 }
 #endif
 
+void VPWidenFFLoadRecipe::execute(VPTransformState &State) {
+  Type *ScalarDataTy = getLoadStoreType(&Ingredient);
+  auto *DataTy = VectorType::get(ScalarDataTy, State.VF);
+  const Align Alignment = getLoadStoreAlignment(&Ingredient);
+
+  auto &Builder = State.Builder;
+  State.setDebugLocFrom(getDebugLoc());
+
+  Value *VL = State.get(getVF(), VPLane(0));
+  Type *I32Ty = Builder.getInt32Ty();
+  VL = Builder.CreateZExtOrTrunc(VL, I32Ty);
+  Value *Addr = State.get(getAddr(), true);
+  Value *Mask = nullptr;
+  if (VPValue *VPMask = getMask())
+    Mask = State.get(VPMask);
+  else
+    Mask = Builder.CreateVectorSplat(State.VF, Builder.getTrue());
+  CallInst *NewLI =
+      Builder.CreateIntrinsic(Intrinsic::vp_load_ff, {DataTy, Addr->getType()},
+                              {Addr, Mask, VL}, nullptr, "vp.op.load.ff");
+  NewLI->addParamAttr(
+      0, Attribute::getWithAlignment(NewLI->getContext(), Alignment));
+  applyMetadata(*NewLI);
+  Value *V = cast<Instruction>(Builder.CreateExtractValue(NewLI, 0));
+  Value *NewVL = Builder.CreateExtractValue(NewLI, 1);
+  State.set(getVPValue(0), V);
+  State.set(getVPValue(1), NewVL, /*NeedsScalar=*/true);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenFFLoadRecipe::print(raw_ostream &O, const Twine &Indent,
+                                VPSlotTracker &SlotTracker) const {
+  O << Indent << "WIDEN ";
+  printAsOperand(O, SlotTracker);
+  O << ", ";
+  getVPValue(1)->printAsOperand(O, SlotTracker);
+  O << " = vp.load.ff ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
 /// Use all-true mask for reverse rather than actual mask, as it avoids a
 /// dependence w/o affecting the result.
 static Instruction *createReverseEVL(IRBuilderBase &Builder, Value *Operand,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 1f6b85270607e..7e78cb6ed02ac 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2760,6 +2760,102 @@ void VPlanTransforms::addExplicitVectorLength(
   Plan.setUF(1);
 }
 
+void VPlanTransforms::adjustFFLoadEarlyExitForPoisonSafety(VPlan &Plan) {
+  VPBasicBlock *Header = Plan.getVectorLoopRegion()->getEntryBasicBlock();
+  VPWidenFFLoadRecipe *LastFFLoad = nullptr;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getVectorLoopRegion())))
+    for (VPRecipeBase &R : *VPBB)
+      if (auto *Load = dyn_cast<VPWidenFFLoadRecipe>(&R)) {
+        assert(!LastFFLoad && "Only one FFLoad is supported");
+        LastFFLoad = Load;
+      }
+
+  // Skip if no FFLoad.
+  if (!LastFFLoad)
+    return;
+
+  // Ensure FFLoad does not read past the remainder in the last iteration.
+  // Set AVL to min(VF, remainder).
+  VPBuilder Builder(Header, Header->getFirstNonPhi());
+  VPValue *Remainder = Builder.createNaryOp(
+      Instruction::Sub, {&Plan.getVectorTripCount(), Plan.getCanonicalIV()});
+  VPValue *Cmp =
+      Builder.createICmp(CmpInst::ICMP_ULE, &Plan.getVF(), Remainder);
+  VPValue *AVL = Builder.createSelect(Cmp, &Plan.getVF(), Remainder);
+  LastFFLoad->setVF(AVL);
+
+  // To prevent branch-on-poison, rewrite the early-exit condition to
+  // VPReductionEVLRecipe. Expected pattern here is:
+  //   EMIT vp<%alt.exit.cond> = AnyOf
+  //   EMIT vp<%exit.cond> = or vp<%alt.exit.cond>, vp<%main.exit.cond>
+  //   EMIT branch-on-cond vp<%exit.cond>
+  auto *ExitingLatch = cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getExiting());
+  auto *LatchExitingBr = cast<VPInstruction>(ExitingLatch->getTerminator());
+
+  VPValue *VPAnyOf = nullptr;
+  VPValue *VecOp = nullptr;
+  assert(
+      match(LatchExitingBr,
+            m_BranchOnCond(m_BinaryOr(m_VPValue(VPAnyOf), m_VPValue()))) &&
+      match(VPAnyOf, m_VPInstruction<VPInstruction::AnyOf>(m_VPValue(VecOp))) &&
+      "unexpected exiting sequence in early exit loop");
+
+  VPValue *OpVPEVLI32 = LastFFLoad->getVPValue(1);
+  VPValue *Mask = LastFFLoad->getMask();
+  FastMathFlags FMF;
+  auto *I1Ty = Type::getInt1Ty(Plan.getContext());
+  VPValue *VPZero = Plan.getOrAddLiveIn(ConstantInt::get(I1Ty, 0));
+  DebugLoc DL = VPAnyOf->getDefiningRecipe()->getDebugLoc();
+  auto *NewAnyOf =
+      new VPReductionEVLRecipe(RecurKind::Or, FMF, VPZero, VecOp, *OpVPEVLI32,
+                               Mask, /*IsOrdered*/ false, DL);
+  NewAnyOf->insertBefore(VPAnyOf->getDefiningRecipe());
+  VPAnyOf->replaceAllUsesWith(NewAnyOf);
+
+  // Using FirstActiveLane in the early-exit block is safe,
+  // exiting conditions guarantees at least one valid lane precedes
+  // any poisoned lanes.
+}
+
+void VPlanTransforms::convertFFLoadEarlyExitToVLStepping(VPlan &Plan) {
+  // Find loop header by locating VPWidenFFLoadRecipe.
+  VPWidenFFLoadRecipe *LastFFLoad = nullptr;
+
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_shallow(Plan.getEntry())))
+    for (VPRecipeBase &R : *VPBB)
+      if (auto *Load = dyn_cast<VPWidenFFLoadRecipe>(&R)) {
+        assert(!LastFFLoad && "Only one FFLoad is supported");
+        LastFFLoad = Load;
+      }
+
+  // Skip if no FFLoad.
+  if (!LastFFLoad)
+    return;
+
+  VPBasicBlock *HeaderVPBB = LastFFLoad->getParent();
+  // Replace IVStep (VFxUF) with returned VL from FFLoad.
+  auto *CanonicalIV = cast<VPPhi>(&*HeaderVPBB->begin());
+  VPValue *Backedge = CanonicalIV->getIncomingValue(1);
+  assert(match(Backedge, m_c_Add(m_Specific(CanonicalIV),
+                                 m_Specific(&Plan.getVFxUF()))) &&
+         "Unexpected canonical iv");
+  VPRecipeBase *CanonicalIVIncrement = Backedge->getDefiningRecipe();
+  VPValue *OpVPEVLI32 = LastFFLoad->getVPValue(1);
+  VPBuilder Builder(HeaderVPBB, HeaderVPBB->getFirstNonPhi());
+  Builder.setInsertPoint(CanonicalIVIncrement);
+  auto *TC = Plan.getTripCount();
+  Type *CanIVTy = TC->isLiveIn()
+                      ? TC->getLiveInIRValue()->getType()
+                      : cast<VPExpandSCEVRecipe>(TC)->getSCEV()->getType();
+  auto *I32Ty = Type::getInt32Ty(Plan.getContext());
+  VPValue *OpVPEVL = Builder.createScalarZExtOrTrunc(
+      OpVPEVLI32, CanIVTy, I32Ty, CanonicalIVIncrement->getDebugLoc());
+
+  CanonicalIVIncrement->setOperand(1, OpVPEVL);
+}
+
 void VPlanTransforms::canonicalizeEVLLoops(VPlan &Plan) {
   // Find EVL loop entries by locating VPEVLBasedIVPHIRecipe.
   // There should be only one EVL PHI in the entire plan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 69452a7e37572..bc5ce3bc43e76 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -269,6 +269,17 @@ struct VPlanTransforms {
   ///      (branch-on-cond eq AVLNext, 0)
   static void canonicalizeEVLLoops(VPlan &Plan);
 
+  /// Applies to early-exit loops that use FFLoad. FFLoad may yield fewer active
+  /// lanes than VF. To prevent branch-on-poison and over-reads past the vector
+  /// trip count, use the returned VL for both stepping and exit computation.
+  /// Implemented by:
+  ///  - adjustFFLoadEarlyExitForPoisonSafety: replace AnyOf with vp.reduce.or over
+  ///    the first VL lanes; set AVL = min(VF, remainder).
+  ///  - convertFFLoadEarlyExitToVLStepping: after region dissolution, convert
+  ///    early-exit loops to variable-length stepping.
+  static void adjustFFLoadEarlyExitForPoisonSafety(VPlan &Plan);
+  static void convertFFLoadEarlyExitToVLStepping(VPlan &Plan);
+
   /// Lower abstract recipes to concrete ones, that can be codegen'd.
   static void convertToConcreteRecipes(VPlan &Plan);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index 0678bc90ef4b5..b2bc430a09686 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -40,6 +40,7 @@ class VPUser;
 class VPRecipeBase;
 class VPInterleaveBase;
 class VPPhiAccessors;
+class VPWidenFFLoadRecipe;
 
 // This is the base class of the VPlan Def/Use graph, used for modeling the data
 // flow into, within and out of the VPlan. VPValues can stand for live-ins
@@ -51,6 +52,7 @@ class LLVM_ABI_FOR_TEST VPValue {
   friend class VPInterleaveBase;
   friend class VPlan;
   friend class VPExpressionRecipe;
+  friend class VPWidenFFLoadRecipe;
 
   const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).
 
@@ -351,6 +353,7 @@ class VPDef {
     VPWidenCastSC,
     VPWidenGEPSC,
     VPWidenIntrinsicSC,
+    VPWidenFFLoadSC,
     VPWidenLoadEVLSC,
     VPWidenLoadSC,
     VPWidenStoreEVLSC,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index 92caa0b4e51d5..70e6e0d006eb6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -166,8 +166,8 @@ bool VPlanVerifier::verifyEVLRecipe(const VPInstruction &EVL) const {
           }
           return VerifyEVLUse(*R, 2);
         })
-        .Case<VPWidenLoadEVLRecipe, VPVectorEndPointerRecipe,
-              VPInterleaveEVLRecipe>(
+        .Case<VPWidenLoadEVLRecipe, VPWidenFFLoadRecipe,
+              VPVectorEndPointerRecipe, VPInterleaveEVLRecipe>(
             [&](const VPRecipeBase *R) { return VerifyEVLUse(*R, 1); })
         .Case<VPInstructionWithType>(
             [&](const VPInstructionWithType *S) { return VerifyEVLUse(*S, 0); })
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/find.ll b/llvm/test/Transforms/LoopVectorize/RISCV/find.ll
new file mode 100644
index 0000000000000..f734bd5f53c82
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/find.ll
@@ -0...
[truncated]

arcbbb · 2025-09-21T02:22:06Z

Rebased after #152422 landed, and updated the PR description as well as title.

arcbbb · 2025-12-16T10:28:07Z

Trying this PR out on some llvm-test-suite benchmarks shows that the generated code always seems to generate LMUL 8 step vectors which is probably not great for performance:

So I don't think there's really much reason why we would want to emit non-tail folded early-exit loops if we can tail fold them eventually.

If performance is a concern, I’m considering replacing the active‑lane‑mask + reduce_or sequence with a vfirst intrinsic + icmp in CodeGenPrepare. The idea is to rewrite:

  %14 = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %11)
  %15 = select <vscale x 16 x i1> %14, <vscale x 16 x i1> %13, <vscale x 16 x i1> zeroinitializer
  %16 = freeze <vscale x 16 x i1> %15
  %17 = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> %16)

into:

%tmp0 = call i32 @llvm.riscv.vfirst(%13, zeroinitializer, %11)
%tmp1 = icmp sge i32 %tmp0, 0

Would this be preferable?

I understand that this is supposed to be an incremental PR, but I think maybe a better ordering might be to start by supporting early exit loops with tail folding. I think this means we need to address the "variable header mask" TODO here:

I think we can do this if we replace the notion of a header mask with the notion of a per-block header mask. I'll see if I can create an issue to discuss some of the design of this more.

Thanks for flagging early‑exit loops with tail folding. That is also a nice-to-have and I am keen to see it.
We can tackle it separately. Hopefully it doesn’t block this PR!

If we have an early exit loop with non-dereferenceable loads after the exit, we currently bail: int z; for (int i = 0; i < N; i++) { if (x[i]) break; z = y[i]; } If the early exit block dominates the block containing these loads, we can predicate them with mask like for (int i = 0; i < N/VF; i++) { c[0..VF] = x[i..i+VF] z[0..VF] = y[i..i+VF], mask=c if (anyof(c)) break; } In VPlan terms, this is `icmp ult step-vector, (first-active-lane exit-cond)` VPlanPredicator can handle predicating these blocks, but in tryToBuildVPlanWithVPRecipes we first disconnect all early exits before the masks are introduced: // entry -> exiting -> ... -> latch // | // +-----> earlyexit VPlanTransforms::handleEarlyExits(*Plan); // entry -> exiting -> ... -> latch VPlanTransforms::introduceMasksAndLinearize(*Plan); This is needed to keep the region single entry/single exit, but it also means that there isn't any control flow by the time we want to add the masks: exiting: %earlyexitcond = ... // one successor (latch) latch: %exitcond = or (anyof %earlyexitcond), %origexitcond br %exitcond, entry, exit This patch propagates the information to VPlanPredicator that the successors should be predicated even though there isn't actually control flow with a new EarlyExit VPInstruction: exiting: %earlyexitcond = ... earlyexit (icmp ult step-vector, (first-active-lane %earlyexitcond) // one successor (latch) latch: %exitcond = or (anyof %earlyexitcond), origexitcond br %exitcond, entry, exit It's just a placeholder and gets immediately removed whenever VPlanPredicator sees it, but allows it to use the correct mask. This makes way for supporting more types of loops, as we could also support stores/divs etc. as long as the exiting block dominates them. See the note in canUncountableExitConditionLoadBeMoved as to why we can't predicate stores when they're before the exiting block. However the main motivation for this to allow us to support tail folding with early exits, which I believe will be needed to make supporting fault-only-first loads simpler: llvm#151300 In order to actually test the changes from this, this PR allows non-dereferenceable loads that are properly dominated by the exiting block in LoopVectorizationLegality. But in practice, something else usually transforms these loops to be multiple-entry which prevents them from being vectorized.

david-arm · 2025-12-19T13:53:05Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

        "Enable vectorization of early exit loops with uncountable exits."));

+static cl::opt<bool>
+    EnableEarlyExitWithFFLoads("enable-early-exit-with-ffload", cl::init(false),


Does it work for other targets besides RISCV? If so, the PR should have some tests for other backends too, or even better in the top level Transforms/LoopVectorize directory.

david-arm · 2025-12-19T13:56:33Z

llvm/test/Transforms/LoopVectorize/RISCV/find.ll

I think we need tests in the top level LoopVectorize directory too. For example, I think it's worth adding an extra RUN line with the -enable-early-exit-with-ffload flag to Transforms/LoopVectorize/single_early_exit.ll and Transforms/LoopVectorize/single_early_exit_live_outs.ll.

Those tests have a very extensive coverage of different CFGs and combinations of live-outs in different scenarios.

- Merge `canonicalizeEVLLoops` and `convertFFLoadEarlyExitToVLStepping` into a single entry point `convertLoopIntoVariableLengthStepping`. Both transforms convert loops to use variable-length stepping after region dissolution. - Generalize poison-safety handling in `adjustFFLoadEarlyExitForPoisonSafety` to mask all AnyOf users of FFLoad (not just a single hardcoded pattern).

github-actions · 2026-01-14T08:17:57Z

⚠️ LLVM ABI annotation checker, ids-check found issues in your code. ⚠️

You can test this locally with the following command:

git diff origin/main HEAD -- 'llvm/include/llvm/**/*.h' 'llvm/include/llvm-c/**/*.h' 'llvm/include/llvm/Demangle/**/*.h'
Then run idt on the changed files with appropriate --export-macro and --include-header flags.

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from ids-check here.

diff --git a/llvm/include/llvm/IR/Type.h b/llvm/include/llvm/IR/Type.h
index 1c042500b..6cfccad75 100644
--- a/llvm/include/llvm/IR/Type.h
+++ b/llvm/include/llvm/IR/Type.h
@@ -14,6 +14,7 @@
 #ifndef LLVM_IR_TYPE_H
 #define LLVM_IR_TYPE_H
 
+#include "llvm/include/llvm/Support/Compiler.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/Support/CBindingWrapping.h"
 #include "llvm/Support/Casting.h"
@@ -234,7 +235,7 @@ public:
   bool isTokenTy() const { return getTypeID() == TokenTyID; }
 
   /// Returns true if this is 'token' or a token-like target type.s
-  bool isTokenLikeTy() const;
+  LLVM_ABI bool isTokenLikeTy() const;
 
   /// True if this is an instance of IntegerType.
   bool isIntegerTy() const { return getTypeID() == IntegerTyID; }

lukel97

I've opened up some thoughts on fault-only-first loads in #175900, but thought I'd mention some things on this PR that I'd like to hear your thoughts on:

fault-only-first loads require variable stepping, which you've converted in this PR. But I think we still need to update induction recipes to account for the fact that they no longer step by VF?
@llvm.experimental.get.vector.length has the property that the step amount may vary on the penultimate iteration, but otherwise the vector loop will always be executed the same number of iterations as a non tail folded loop. i.e. the backedge taken count is always the same.
I don't think this is true of @llvm.vp.load.ff. It's valid for a vle*ff.v to reduce VL to one for every iteration, which means the vector trip count may change. Do we need to go through any users of the vector trip count and check if they need adapted?
We don't predicate any recipes after the vp.load.ff. I think this happens to work at the moment because we currently don't allow any non-speculatable recipes in early-exit loops, but this won't be the case after #172454.
I think it would be good to handle this logic together in this PR, otherwise it will surprisingly break when more non-speculatable recipes are allowed. I think landing #172454 first would allow us to write test cases for this.

arcbbb · 2026-01-16T08:51:55Z

fault-only-first loads require variable stepping, which you've converted in this PR. But I think we still need to update induction recipes to account for the fact that they no longer step by VF?

Good catch! I missed this when porting the patch. I'll add a test for it.

Do we need to go through any users of the vector trip count and check if they need adapted?

Non-tail-folded loops don't need changes because VTC is (N - (N % Step)), and its uses in the exit condition and resume-phi treat it as a cumulative processed count.
Tail-folded loops (not yet supported by this patch) will require adapting users since VTC is rounded up to multiple of VF.

We don't predicate any recipes after the vp.load.ff. I think this happens to work at the moment because we currently don't allow any non-speculatable recipes in early-exit loops, but this won't be the case after [VPlan] Handle early exit loops with non-dereferenceable loads in latch #172454.
I think it would be good to handle this logic together in this PR, otherwise it will surprisingly break when more non-speculatable recipes are allowed. I think landing [VPlan] Handle early exit loops with non-dereferenceable loads in latch #172454 first would allow us to write test cases for this.

I believe #172454 predicates those non-speculatable recipes, so even after it lands, they'll be guarded by masks. We'll still need to gate those masks with the faulting lane.
We could optimize this similar to optimizeMaskToEVL, but I'd prefer to keep the initial fault-only-first load support minimal and address that in a follow-up.

lukel97 · 2026-01-18T07:10:24Z

Non-tail-folded loops don't need changes because VTC is (N - (N % Step)), and its uses in the exit condition and resume-phi treat it as a cumulative processed count.
Tail-folded loops (not yet supported by this patch) will require adapting users since VTC is rounded up to multiple of VF.

Thanks for checking, I got my terminology mixed up though. I think I originally meant to ask about the "vector backedge taken count" but that doesn't seem to be a constant that we materialize in the loop vectorizer. My concern is do we have any behaviour that implicitly relies on the vector backedge taken count always being VTC / VFxUF? We might not have any but it would be good to check.

I believe #172454 predicates those non-speculatable recipes, so even after it lands, they'll be guarded by masks. We'll still need to gate those masks with the faulting lane.

#172454 only predicates lanes off past the early exit, but doesn't predicate any lanes that need to be masked off because vp.load.ff reduced the VL.

… stepping This is groundwork for llvm#151300, which aims to support first-faulting loads in non-tail-folded early-exit loops. Per llvm#175900, we need a variable-length stepping transform that can be shared between EVL and non-EVL loops. The idea is to have an EVL-independent counter and transform for tracking the cumulative number of processed elements. This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and transform (canonicalizeEVLLoops) to be EVL-independent: - Rename VPEVLBasedIVPHIRecipe to VPNumProcessedElementsPHIRecipe to reflect its general purpose of tracking processed element count. - In addExplicitVectorLength, switch the loop from up-counting to down-counting by transforming: (branch-on-count CanonicalIVInc, VTC) to: (branch-on-count (sub VTC, CanonicalIVInc), 0) - Rename canonicalizeEVLLoops to convertToVariableLengthStep and update the exit condition from: (branch-on-count (sub VectorTripCount, CanonicalIVInc), 0) -> (branch-on-count (sub TripCount, (add Step, CumulativeIV)), 0) -> (branch-on-count (sub AVL, Step), 0) if AVL is available.

This is groundwork for llvm#151300, which aims to support first-faulting loads in non-tail-folded early-exit loops. Per llvm#175900, we need a variable-length stepping transform that can shared between EVL and non-EVL loops. The idea is to have an EVL-independent counter and transform for tracking the cumulative number of processed elements. This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and transform (canonicalizeEVLLoops) to be EVL-independent: - Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to reflect its general purpose of tracking processed element count. - Rename canonicalizeEVLLoops to convertToVariableLengthStep.

…ipe (#177114) This is groundwork for #151300, which aims to support first-faulting loads in non-tail-folded early-exit loops. Per #175900, we need a variable-length stepping transform that can shared between EVL and non-EVL loops. The idea is to have an EVL-independent counter and transform for tracking the cumulative number of processed elements. This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and transform (canonicalizeEVLLoops) to be EVL-independent: - Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to reflect its general purpose of tracking processed element count. - Rename canonicalizeEVLLoops to convertToVariableLengthStep. This is NFC.

…ipe (llvm#177114) This is groundwork for llvm#151300, which aims to support first-faulting loads in non-tail-folded early-exit loops. Per llvm#175900, we need a variable-length stepping transform that can shared between EVL and non-EVL loops. The idea is to have an EVL-independent counter and transform for tracking the cumulative number of processed elements. This patch renames the existing counter (VPEVLBasedIVPHIRecipe) and transform (canonicalizeEVLLoops) to be EVL-independent: - Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationRecipe to reflect its general purpose of tracking processed element count. - Rename canonicalizeEVLLoops to convertToVariableLengthStep. This is NFC.

arcbbb requested review from alexey-bataev, david-arm and fhahn July 30, 2025 09:54

Meinersbur mentioned this pull request Jul 30, 2025

[MLIR][OpenMP] Add canonical loop LLVM-IR lowering #147069

Merged

arcbbb requested review from Mel-Chen and lukel97 July 30, 2025 10:04

david-arm requested a review from huntergr-arm July 30, 2025 10:08

arcbbb mentioned this pull request Jul 30, 2025

[VP][RISCV] Add a vp.load.ff intrinsic for fault only first load. #128593

Merged

alexey-bataev reviewed Jul 30, 2025

View reviewed changes

arcbbb changed the title ~~[LV] Add support for speculative loads in loops that may fault~~ [RFC][LV] Add support for speculative loads in loops that may fault Jul 31, 2025

arcbbb mentioned this pull request Aug 7, 2025

[LV] Add initial legality checks for loops with unbound loads. #152422

Merged

arcbbb force-pushed the early-exit-dev branch from d044e7a to 1bb79f8 Compare August 20, 2025 07:54

lukel97 reviewed Sep 2, 2025

View reviewed changes

arcbbb force-pushed the early-exit-dev branch from 1bb79f8 to f8d7616 Compare September 21, 2025 01:14

arcbbb changed the title ~~[RFC][LV] Add support for speculative loads in loops that may fault~~ [VPlan] Enable vectorization of early-exit loops with unit-stride fault-only-first loads Sep 21, 2025

arcbbb force-pushed the early-exit-dev branch from f8d7616 to 1ab67ae Compare September 21, 2025 01:42

arcbbb marked this pull request as ready for review September 21, 2025 01:42

llvmbot added backend:RISC-V vectorizers llvm:transforms labels Sep 21, 2025

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Sep 22, 2025

arcbbb force-pushed the early-exit-dev branch from b6add27 to 755ad37 Compare September 22, 2025 07:23

llvmbot added the backend:RISC-V label Dec 16, 2025

lukel97 mentioned this pull request Dec 16, 2025

[VPlan] Handle early exit loops with non-dereferenceable loads in latch #172454

Draft

david-arm reviewed Dec 19, 2025

View reviewed changes

arcbbb force-pushed the early-exit-dev branch from 43bbad6 to adc53d4 Compare January 14, 2026 03:13

lukel97 mentioned this pull request Jan 14, 2026

First-fault/fault-only-first load vectorization #175900

Open

arcbbb force-pushed the early-exit-dev branch from adc53d4 to cd23a06 Compare January 14, 2026 08:07

arcbbb added 5 commits January 14, 2026 00:09

[VPlan] Support struct return types for widen intrinsics (NFC)

4929fea

Support WidenFFLoad in early-exit loop

8e0aa51

Reuse getCostForIntrinsics and remove special handling for vp_load_ff

4324a3d

Rebase code

3d72166

arcbbb force-pushed the early-exit-dev branch from cd23a06 to 3d72166 Compare January 14, 2026 08:15

llvm deleted a comment from github-actions bot Jan 14, 2026

arcbbb mentioned this pull request Jan 14, 2026

[RISCV] Convert vector.reduce.or + cttz.elts to vfirst #175952

Open

lukel97 reviewed Jan 15, 2026

View reviewed changes

arcbbb mentioned this pull request Jan 22, 2026

[NFC][VPlan] Rename VPEVLBasedIVPHIRecipe to VPCurrentIterationPHIRecipe #177114

Merged

lukel97 mentioned this pull request Mar 2, 2026

[VPlan][NFC] Extract addCurrentIterationPhi from addExplicitVectorLength #182650

Closed

lukel97 mentioned this pull request Mar 29, 2026

[IR] Add llvm.masked.load.first.fault intrinsic #156470

Open

Conversation

arcbbb commented Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arcbbb commented Jul 30, 2025

Uh oh!

github-actions bot commented Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-bataev Jul 30, 2025

Choose a reason for hiding this comment

Uh oh!

arcbbb Aug 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

david-arm commented Jul 30, 2025

Uh oh!

arcbbb commented Aug 1, 2025

Uh oh!

lukel97 Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

arcbbb Sep 21, 2025

Choose a reason for hiding this comment

Uh oh!

llvmbot commented Sep 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 21, 2025

Uh oh!

arcbbb commented Sep 21, 2025

Uh oh!

arcbbb commented Dec 16, 2025

Uh oh!

david-arm Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

david-arm Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Jan 14, 2026

Uh oh!

lukel97 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arcbbb commented Jan 16, 2026

Uh oh!

lukel97 commented Jan 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

arcbbb commented Jul 30, 2025 •

edited

Loading

github-actions bot commented Jul 30, 2025 •

edited

Loading

llvmbot commented Sep 21, 2025 •

edited

Loading

lukel97 left a comment •

edited

Loading